Improving Text Retrieval Accuracy Using a Graph of Terms
Abstract
Expanding an informational query with additional knowledge, for instance other topic-related terms, has been shown to increase the number of relevant documents returned from a Web repository. In this paper we propose a new technique that, given a small set of documents on a topic, automatically builds a query expansion by means of the probabilistic topic model. The expansion is based on a mixed Graph of Terms (mGT) representation composed of two levels: the conceptual level, a set of interconnected terms representing concepts (undirected edges), and the word level, composed of the cloud of interconnected words specifying a concept (directed edges). An mGT can be learnt automatically from a small set of documents in two learning stages using the probabilistic topic model. We evaluated performance by comparing our search methodology with a classic one in which the query expansion consists only of the list of concepts and words composing the graph, so that the relations between them are not considered. The results show that our system, independently of the topic, retrieves more relevant web pages.
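The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `MixedGraphOfTerms`, the helper functions, the example terms, and all edge weights are hypothetical, and the topic-model learning stages are not shown. The sketch only contrasts a flat expansion (terms only, as in the classic baseline) with an expansion that keeps the weighted relations.

```python
from dataclasses import dataclass, field


@dataclass
class MixedGraphOfTerms:
    """Toy mGT: undirected concept-concept edges, directed concept->word edges."""
    # Undirected edges keyed by a frozenset of two concepts; value is a weight.
    concept_edges: dict = field(default_factory=dict)
    # Directed edges stored as concept -> {word: weight}.
    word_edges: dict = field(default_factory=dict)

    def add_concept_edge(self, a, b, weight):
        if a == b:
            raise ValueError("a concept edge needs two distinct concepts")
        self.concept_edges[frozenset((a, b))] = weight

    def add_word_edge(self, concept, word, weight):
        self.word_edges.setdefault(concept, {})[word] = weight

    def concepts(self):
        found = set(self.word_edges)
        for pair in self.concept_edges:
            found.update(pair)
        return found


def flat_expansion(graph):
    """Baseline: the list of concepts and words only, relations ignored."""
    terms = set(graph.concepts())
    for words in graph.word_edges.values():
        terms.update(words)
    return sorted(terms)


def relational_expansion(graph):
    """Relation-aware expansion: one weighted term pair per graph edge."""
    pairs = []
    for pair, weight in graph.concept_edges.items():
        a, b = sorted(pair)
        pairs.append((a, b, weight))
    for concept, words in graph.word_edges.items():
        for word, weight in words.items():
            pairs.append((concept, word, weight))
    # Strongest relations first; ties broken alphabetically.
    return sorted(pairs, key=lambda p: (-p[2], p[0], p[1]))


def build_example_graph():
    """Hand-made graph with invented weights, for illustration only."""
    graph = MixedGraphOfTerms()
    graph.add_concept_edge("query", "retrieval", 0.8)
    graph.add_word_edge("query", "expansion", 0.6)
    graph.add_word_edge("retrieval", "document", 0.7)
    return graph
```

How the weighted pairs are turned into clauses of an actual search-engine query is left open here, since the abstract does not specify it.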
Similar resources
Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm
Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...
An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches
Text coherence evaluation has become a vital and attractive task in Natural Language Processing subfields such as text summarization, question answering, text generation and machine translation. Existing methods such as entity-based and graph-based models deal with how nouns and noun phrases change role across sequential sentences within a short part of a text. They even have limitations in global coheren...
A Study of the Role of Homograph Context Types in Determining Similarity Between Documents
Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation and thereby improve the effectiveness of the results retrieved. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided into different types, wit...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...
Semiautomatic Image Retrieval Using the High Level Semantic Labels
Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches have led researchers to combine them and to use semi-automatic retrieval with user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provides two kinds of qu...